October 19, 2023

Hello!

I am Emily Josephs

  • interested in: evolutionary genetics, plants, triathlons, cat

1st semester goals

  1. Data into spreadsheet

  2. Wrangle data into R

  3. Visualize data

  4. Build a model

    • how are data distributed?
    • relationship between predictor and response?
  5. Use algorithm to parameterize model

  6. Interpret results

IBio 830, part II

Covering:

  • probability
  • probability distributions
  • likelihood
  • likelihood-based inference
  • Bayesian inference
  • deterministic functions
  • model building

Still using R, but focus won’t be on the language itself!

Shaping Expectations

This part of the course may feel like it moves a little faster.

There will be math!

You will be learning a new skill while relying on another skill you only recently picked up, which is hard!

The material will build on itself.

I am learning along with you!!!

ALSO:

Everything else in the world is happening!!!

How to take this part of the course

  1. take a deep breath!

  2. believe in yourself!!!

  3. remember that your primary goal is present understanding, but a solid secondary goal is future understanding.

  4. try to stay on top of the work.

  5. don’t be afraid of office hours!!

Office hours!

Change from syllabus:

Office hours are Mondays 2-4pm in PLB 266 or on zoom.

Zoom Room: https://msu.zoom.us/my/emjos (Password: plantsrule)

Why am I doing this to you?

Many of you got into this business because of a love of nature, not a love of stats and computers.

Our goal is to try to translate your love of nature into a love of patterns, and give you the tools to do so!

Biology has become a driver of quantitative methods in STEM because interesting patterns are subtle and complicated.

Also, coding and stats are very useful skills.

Simulating data

Simulations?

Simulating data

One of the most powerful tools in our statistics learning toolkit is simulation.

Simulations let you generate data that you know should look a certain way, so you can test your intuitions.

Simulations also let you do the same thing over and over and over.

Simulating data

thisClass = 1:35

thisClass
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35
sample(thisClass, size=1)
## [1] 31

Simulating data

sample(thisClass, size=1)
## [1] 15
sample(thisClass, size=1)
## [1] 14
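Every call to sample() above gives a different answer, which is the point, but it also makes results hard to reproduce. A minimal sketch of one way to make a simulation repeatable, assuming base R's set.seed(); the seed value 830 and the variable names firstPick and secondPick are arbitrary choices for illustration:

```r
thisClass <- 1:35

# Assumption: 830 is an arbitrary seed; any integer would do
set.seed(830)
firstPick <- sample(thisClass, size = 1)

# resetting the same seed replays the same random draw
set.seed(830)
secondPick <- sample(thisClass, size = 1)

firstPick == secondPick    # TRUE: same seed, same draw
```

Without the second set.seed() call, the two picks would usually differ.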

Today’s lecture - Intro to Probability!

How do you define probability?


A measure of the likelihood that an event will occur

A long-term frequency

How often we expect an event to happen (‘degree of belief’)

How I use probability

Will I get to Park Place?


Which definition matches the probability of landing on Park Place?

How I use probability

When will I give birth?


Which definition matches the probability of giving birth on a specific day?

How I use probability

Who will win the election?


Which definition matches the probability of an election outcome?

Today’s rules of probability:

The probability of an outcome is the number of times the outcome occurs divided by the total number of trials.

A simple event consists of a single experiment and has a single outcome

The sum of the probabilities of all possible outcomes of an event is equal to 1.

The probability of a shared event is the product of the probabilities of its constitutive events, so long as they’re independent.

The probability of a complex event is the sum of the probabilities of its constitutive events.

Notation note

A set is an unordered collection of distinct objects.

S = {1,2,3,4,5} = {4,3,2,5,1}

The intersection of two sets is all the elements that appear in both sets.

{1,2,3} \(\cap\) {2,3,4} = {2,3}

The union of two sets is every element that appears in either set.

{1,2,3} \(\cup\) {2,3,4} = {1,2,3,4}
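The same set operations are available in base R, so you can check the notation against code. A small sketch, where the names A and B are just labels chosen for this example:

```r
A <- c(1, 2, 3)
B <- c(2, 3, 4)

intersect(A, B)   # elements that appear in both sets: 2 3
union(A, B)       # elements that appear in either set: 1 2 3 4

# order does not matter for sets
setequal(c(1, 2, 3, 4, 5), c(4, 3, 2, 5, 1))   # TRUE
```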

Probability: Simple Events

I have a set {1,2,3,4,5}

What’s the probability that, if I draw a number at random, I’ll get a 5?

This is a simple event -

  • it consists of a single experiment (drawing a random number)
  • and has a single outcome (the number drawn)

Probability: simulation is your friend!

help(sample)

Probability: simulation is your friend!

numbers <- c(1,2,3,4,5)
sample(numbers,1)
## [1] 3
sample(numbers,1)
## [1] 2
sample(numbers,1)
## [1] 2

Probability: simulation is your friend!

Working in teams, figure out a way to use a large-scale simulation (1e5 trials) to estimate the probability of picking a 5 in the following set:

{1,2,3,4,5}

  • hint - look at the documentation for sample()!

Probability: simulation is your friend!

numbers <- 1:5
nDraws <- 1e5
mySamples <- sample(numbers,nDraws,replace=TRUE)
length(which(mySamples==5))/nDraws
## [1] 0.201
1/5
## [1] 0.2
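One way to see the "long-term frequency" idea in action is to repeat the estimate with more and more draws. This is only a sketch: the trial counts in nDrawsToTry and the seed are arbitrary assumptions, and mean(draws == 5) is used as a shorthand for length(which(draws == 5))/n.

```r
set.seed(1)   # assumption: arbitrary seed, only so the run is repeatable
numbers <- 1:5
nDrawsToTry <- c(10, 100, 1e3, 1e4, 1e5)   # hypothetical trial counts

estimates <- sapply(nDrawsToTry, function(n) {
  draws <- sample(numbers, n, replace = TRUE)
  mean(draws == 5)    # proportion of draws equal to 5
})

# compare each estimate with the exact answer, 1/5
data.frame(nDraws = nDrawsToTry, estimate = estimates, truth = 1/5)
```

The estimates from small simulations tend to bounce around, while the large ones should sit close to 0.2.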

Probability - some terminology

We denote the probability of an event with P().

So, we can write the probability of a simple event e as:


\(\Large P(e)\)

as in:

\(\Large P(\text{drawing a }5) = \frac{1}{5}\)

Probability: simple events

\(\Large P(\text{drawing a }5) = \frac{1}{5}\)


What’s the probability of drawing a 4 out of {1,2,3,4,5}?


\(\Large P(4) = P(5) = P(1) = P(2) = P(3) = \frac{1}{5}\)

An Axiom of Probability:


Probabilities sum to 1 (normalization):

\(\Large \sum\limits_{i=1}^n P(e_{i}) = 1\)

The sum of the probabilities of all the outcomes of an event is 1

\(\Large P(4) + P(5) + P(1) + P(2) + P(3) = 1\)
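A quick way to check this rule by simulation is to estimate the probability of every outcome at once and add the estimates up. A sketch, assuming the same numbers and nDraws as before (the seed and the name outcomeProbs are choices made for this example):

```r
set.seed(2)   # assumption: arbitrary seed
numbers <- 1:5
nDraws <- 1e5
mySamples <- sample(numbers, nDraws, replace = TRUE)

# estimated probability of each outcome (each should be near 1/5)
outcomeProbs <- table(mySamples) / nDraws
outcomeProbs

# the estimates across all possible outcomes add up to 1
sum(outcomeProbs)
```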

Rules of probability: recap

The probability of an outcome is the number of times the outcome occurs divided by the total number of trials.

A simple event consists of a single experiment and has a single outcome

The sum of the probabilities of all possible outcomes of an event is equal to 1.

Probability: shared events

A shared event is the simultaneous occurrence of simple events

The probability of a shared event is denoted P(A \(\cap\) B)

(read as probability of the intersection of simple events A and B)

E.g., probability of drawing 2 numbers and getting a 4 AND a 5

the probability of a shared event is the product of the probabilities of its constitutive simple events

[As long as the simple events are independent, meaning that the outcome of one event does not depend on the outcome of another.]

Probability: simulating shared events


numbers <- 1:5

What’s the probability you draw the sequence (4,5), in that order?

Probability: simulating shared events

numbers <- 1:5
nDraws <- 1e5
## draw two numbers (with replacement), nDraws times
two.draws <- replicate(nDraws,
                    sample(numbers,2,replace=TRUE),
              simplify=FALSE)
## go through the list: which pairs are exactly the sequence (4,5)?
is45 <- lapply(two.draws, function(x){
  sum(x==c(4,5))==2
})

## count the matches and convert to a proportion
prop45 <- sum(unlist(is45))/nDraws

prop45
## [1] 0.0402

Probability: shared events

the probability of a shared event is the product of the probabilities of the simple events that make up the shared event:

\(\Large p(4 \text{ in first draw}) = \frac{1}{5}\)

\(\Large p(5 \text{ in second draw}) = \frac{1}{5}\)

\(\Large p(\text{sequence } (4,5) ) = \frac{1}{5} \times \frac{1}{5} = 0.04\)
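The product rule can also be checked directly in a simulation by estimating each simple event separately and multiplying. This is a vectorized sketch rather than the list-based approach; the names firstDraw, secondDraw, pFirst4, pSecond5, and pBoth are made up for this example.

```r
set.seed(3)   # assumption: arbitrary seed
numbers <- 1:5
nDraws <- 1e5

# two independent draws per trial, stored as two vectors
firstDraw  <- sample(numbers, nDraws, replace = TRUE)
secondDraw <- sample(numbers, nDraws, replace = TRUE)

pFirst4  <- mean(firstDraw == 4)                     # p(4 in first draw)
pSecond5 <- mean(secondDraw == 5)                    # p(5 in second draw)
pBoth    <- mean(firstDraw == 4 & secondDraw == 5)   # p(sequence (4,5))

# the simulated shared-event probability should be close to the product
c(simulated = pBoth, product = pFirst4 * pSecond5, exact = 1/25)
```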

Rules of probability: recap

The probability of an outcome is the number of times the outcome occurs divided by the total number of trials.

A simple event consists of a single experiment and has a single outcome

The sum of the probabilities of all possible outcomes of an event is equal to 1.

The probability of a shared event is the product of the probabilities of its constitutive events, so long as they’re independent.

The probability of a complex event is the sum of the probabilities of its constitutive events.

Probability: complex events

Complex events are composites of simple events


e.g., rolling two dice and summing their values


The probability of a complex event is the sum of the probabilities of the simple events that make up the complex event

Probability: simulating complex events

numbers <- c(1,2,3,4,5)


If I draw two numbers (with replacement, so the same number can come up twice), what’s the probability that they sum to 6?

Return to teams and update your simulations to test this.

two.draws <- numeric(1e5)
for(i in 1:1e5){
    two.draws[i] <- sum(
                        sample(1:5,2,replace=TRUE)
                    )
}

length(which(two.draws==6))/1e5
## [1] 0.2

Probability: simulating complex events

numbers <- c(1,2,3,4,5)


If I draw two numbers (with replacement, so the same number can come up twice), what’s the probability that they sum to 6?

two.draws <- replicate(1e5,
                    sum(
                        sample(1:5,2,replace=TRUE)
                    )
             )
length(which(two.draws==6))/1e5
## [1] 0.201

Probability: complex events

If I draw two numbers (with replacement, so the same number can come up twice), what’s the probability that they sum to 6?

The total state space is 5 \(\times\) 5 = 25 outcomes, each of which is equally probable (1/25).

And we can get a sum of 6 with 5 of those outcomes:

{(1,5),(2,4),(3,3),(4,2),(5,1)}

So the probability of drawing two numbers that sum to 6 is the sum of the probabilities of the simple events that make up that complex event.

\(\large P(a + b = 6) = \frac{1}{25} + \frac{1}{25} + \frac{1}{25} + \frac{1}{25} + \frac{1}{25} = \frac{1}{5}\)
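Because the state space here is small, you can also list it out in R and count instead of simulating. A sketch using base R's expand.grid(); the names stateSpace and sumsToSix are just labels for this example:

```r
numbers <- 1:5

# every ordered pair (first draw, second draw)
stateSpace <- expand.grid(first = numbers, second = numbers)
nrow(stateSpace)                     # 25 equally likely outcomes

# which pairs sum to 6?
sumsToSix <- stateSpace$first + stateSpace$second == 6
stateSpace[sumsToSix, ]              # the same five pairs listed above

sum(sumsToSix) / nrow(stateSpace)    # 5/25 = 0.2
```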

Probability: complex events

In general,

\(\large P(E_{1} \cup E_{2}) = P(E_{1}) + P(E_{2})\)

if \(E_{1}\) and \(E_{2}\) are mutually exclusive events.

(This is also an axiom)
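To see why the "mutually exclusive" condition matters, here is a small sketch on the set {1,2,3,4,5}. The helper pEvent() and the example events (even numbers, numbers greater than 3) are hypothetical, chosen only to illustrate the rule.

```r
numbers <- 1:5

# hypothetical helper: each number is equally likely, so the probability
# of an event is the fraction of numbers that belong to it
pEvent <- function(inEvent) mean(inEvent)

# mutually exclusive: a single draw cannot be both a 4 and a 5
pEvent(numbers == 4 | numbers == 5)            # 0.4
pEvent(numbers == 4) + pEvent(numbers == 5)    # 0.4

# not mutually exclusive: "even" and "greater than 3" share the number 4
isEven <- numbers %% 2 == 0
isBig  <- numbers > 3
pEvent(isEven | isBig)                         # 0.6
pEvent(isEven) + pEvent(isBig)                 # 0.8, counts the 4 twice
```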

Rules of probability: recap

The probability of an outcome is the number of times the outcome occurs divided by the total number of trials.

A simple event consists of a single experiment and has a single outcome

The sum of the probabilities of all possible outcomes of an event is equal to 1.

The probability of a shared event is the product of the probabilities of its constitutive events, so long as they’re independent.

The probability of a complex event is the sum of the probabilities of its constitutive events.

Probability: sparkly shoes

You record the hair color and shoe phenotype of all the kids in a daycare.

             plain  sparkly
black hair      48       10
brown hair      35        7


If you pick a kid at random, what is the probability they have black hair and sparkly shoes?

Daycare example: simple event

             plain  sparkly
black hair      48       10
brown hair      35        7


If you pick a kid at random, what is the probability they have black hair and sparkly shoes?

10/100 = 0.1

Daycare example: complex event

             plain  sparkly
black hair      48       10
brown hair      35        7


What’s the probability of having black hair?


Note that there are now two categories of kids with black hair: the plain-shoed and the sparkle-shoed.

Daycare example: complex event

             plain  sparkly
black hair      48       10
brown hair      35        7


What’s the probability of having black hair?

p(black hair) = p(black hair AND plain shoes) + p(black hair AND sparkly shoes) = 48/100 + 10/100 = 0.58
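The daycare counts can be put into an R matrix so these probabilities are computed rather than read off by eye. A sketch, with the object names daycare and jointProbs chosen for this example:

```r
# counts copied from the daycare table
daycare <- matrix(c(48, 10,
                    35,  7),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(hair  = c("black", "brown"),
                                  shoes = c("plain", "sparkly")))

# probability of each hair/shoe combination
jointProbs <- daycare / sum(daycare)
jointProbs["black", "sparkly"]    # 0.1

# complex event: add up the mutually exclusive ways of having black hair
sum(jointProbs["black", ])        # 0.58
```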

Daycare example: shared event

What’s the probability of picking two kids that both have black hair and sparkly shoes?


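             plain  sparkly
black hair      48       10
brown hair      35        7


p(black hair and sparkly shoes) x p(black hair and sparkly shoes) = 0.1 x 0.1 = 0.01

(This treats the two picks as independent, i.e., the first kid goes back into the pool before the second pick.)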
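The product rule needs independent picks. A sketch of how much the answer shifts if the first kid is not put back before the second pick; the variable names here are hypothetical and the counts come from the daycare table:

```r
nKids   <- 100   # total kids in the daycare table
nTarget <- 10    # kids with black hair and sparkly shoes

# independent picks (the first kid goes back into the pool)
withReplacement <- (nTarget / nKids) * (nTarget / nKids)
withReplacement       # 0.01

# dependent picks (the first kid is removed, so the pool shrinks)
withoutReplacement <- (nTarget / nKids) * ((nTarget - 1) / (nKids - 1))
withoutReplacement    # about 0.0091
```

With 100 kids the two answers are close, but they would differ more in a smaller daycare.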

Rules of probability: thus far

  1. The probability of an outcome is the number of times the outcome occurs divided by the total number of trials.

  2. The probability of a complex event is the sum of the probabilities of its constitutive events.

  3. The probability of a shared event is the product of the probabilities of its constitutive events, so long as they’re independent.

  4. The sum of the probabilities of all possible outcomes of an event is equal to 1.

Conditional probabilities

             plain  sparkly  total
black hair      48       10     58
brown hair      35        7     42
total           83       17    100




- probability of sparkly shoes IF you have black hair?

Conditional probabilities

  • a conditional probability is the probability of one outcome conditional on another

  • written as p(A | B), read as “probability of A given B”

  • probability of the intersection of A and B, divided by the probability of B

\(\Huge p(A|B) = \frac{p(A ~ \cap ~ B)}{p(B)}\)

Conditional probabilities

             plain  sparkly  total
black hair      48       10     58
brown hair      35        7     42
total           83       17    100



- probability of sparkly shoes IF you have black hair?

- p(sparkly | black hair) = p(sparkly \(\cap\) black hair)/p(black hair) = (10/100) / (58/100) = 10/58 = 0.172
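The same conditional probability can be computed from the table in R. A sketch, assuming the counts are stored in a matrix called daycare (a name chosen for this example); prop.table() is base R:

```r
# counts copied from the daycare table
daycare <- matrix(c(48, 10,
                    35,  7),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(hair  = c("black", "brown"),
                                  shoes = c("plain", "sparkly")))
jointProbs <- daycare / sum(daycare)

pBlackAndSparkly <- jointProbs["black", "sparkly"]   # p(A and B) = 0.10
pBlack           <- sum(jointProbs["black", ])       # p(B) = 0.58
pBlackAndSparkly / pBlack                            # 10/58, about 0.172

# prop.table() with margin = 1 divides each row by its row total,
# giving p(shoe type | hair color) for every row at once
prop.table(daycare, margin = 1)
```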

Conditional probabilities

  • recall that if A and B are independent, p( A \(\cap\) B) = p(A) * p(B)

  • so if A and B are independent:

\(\Large \begin{aligned} p(A \mid B) &= \frac{p(A ~ \cap ~ B)}{p(B)} \\ \\ &= \frac{p(A)p(B)}{p(B)} = p(A) \end{aligned}\)

Conditional probabilities

             plain  sparkly  total
black hair      48       10     58
brown hair      35        7     42
total           83       17    100


Does a kid’s hair color affect the probability that their parents will get them a sweet pair of sparkly shoes?

p(sparkle shoes | black hair) = 0.172

p(sparkle shoes) = 0.17
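One way to put numbers on this question is to compare the conditional probability with the overall one, and the observed joint probability with what independence would predict. A sketch under the same assumed daycare matrix (object names are illustrative):

```r
# counts copied from the daycare table
daycare <- matrix(c(48, 10,
                    35,  7),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(hair  = c("black", "brown"),
                                  shoes = c("plain", "sparkly")))
jointProbs <- daycare / sum(daycare)

pSparkly <- sum(jointProbs[, "sparkly"])    # 0.17
pBlack   <- sum(jointProbs["black", ])      # 0.58
pSparklyGivenBlack <- jointProbs["black", "sparkly"] / pBlack

# if hair color did not matter, these two would be equal
c(conditional = pSparklyGivenBlack, overall = pSparkly)

# equivalently, the joint probability would equal the product of the marginals
c(observed = jointProbs["black", "sparkly"],
  expectedIfIndependent = pBlack * pSparkly)
```

In this table the two pairs of numbers are very close, which is what approximate independence looks like.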

Rules of probability: recap

  1. The probability of an outcome is the number of times the outcome occurs divided by the total number of trials.

  2. The probability of a complex event is the sum of the probabilities of its constitutive events.

  3. The probability of a shared event is the product of the probabilities of its constitutive events, so long as they’re independent.

  4. The sum of the probabilities of all possible outcomes of an event is equal to 1.

  5. A conditional probability is the probability of one outcome conditional on another, and is equal to the probability of the intersection of the outcomes, divided by the probability of the condition.

You roll a pair of 4-sided dice 100 times in a row and record the combination of numbers you get (1st roll in row, 2nd roll in col):

        1   2   3   4
    1   7   6   8   4
    2   5   4   4   8
    3   7   9   8   6
    4   9   5   6   4

POLL - p(the first roll is a “1”)?

A - 7/100
B - (7+6+8+4)/100
C - (7+5+7+9)/100
D - 7/(7+5+7+9)
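A table like the one above could come from a simulation along these lines. This is only a sketch of how such counts might be generated, not the code behind the slide; the seed and the names firstRoll, secondRoll, and rollCounts are assumptions.

```r
set.seed(4)   # assumption: arbitrary seed
nRolls <- 100

# roll two 4-sided dice nRolls times
firstRoll  <- sample(1:4, nRolls, replace = TRUE)
secondRoll <- sample(1:4, nRolls, replace = TRUE)

# cross-tabulate: rows are the first roll, columns are the second roll
# (factor levels keep all four faces in the table even if one never shows up)
rollCounts <- table(first  = factor(firstRoll,  levels = 1:4),
                    second = factor(secondRoll, levels = 1:4))
rollCounts

sum(rollCounts)   # 100: every pair of rolls lands in exactly one cell
```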

You roll a pair of 4-sided dice 100 times in a row and record the combination of numbers you get (1st roll in row, 2nd roll in col):

        1   2   3   4
    1   7   6   8   4
    2   5   4   4   8
    3   7   9   8   6
    4   9   5   6   4

POLL - Expected p(rolling (2,2))?

A - \(\frac{1}{4} \times \frac{1}{4}\)
B - \(\frac{1}{4} + \frac{1}{4}\)
C - \(\left(\frac{1}{4}\right)^{4}\)

You roll a pair of 4-sided dice 100 times in a row and record the combination of numbers you get (1st roll in row, 2nd roll in col):

        1   2   3   4
    1   7   6   8   4
    2   5   4   4   8
    3   7   9   8   6
    4   9   5   6   4

POLL - Expected p(sum of both rolls is greater than 3)?

A - \(\frac{1}{4} \times \frac{1}{4} \times \frac{13}{16}\)
B - \(\frac{13}{16}\)
C - \(\frac{1}{2}\)

You roll a pair of 4-sided dice 100 times in a row and record the combination of numbers you get (1st roll in row, 2nd roll in col):

        1   2   3   4
    1   7   6   8   4
    2   5   4   4   8
    3   7   9   8   6
    4   9   5   6   4

POLL - Expected p(sum of both rolls > 3 given first roll is a 1)?

A - \(\frac{1}{4} \times \frac{1}{4}\)
B - \(\frac{2}{16}\)
C - \(\frac{1}{2}\)
